3 research outputs found

    Contribuciones a la aplicación de la factorización de matrices no negativas a las tecnologías del habla [Contributions to the application of non-negative matrix factorization to speech technologies]

    In real scenarios, the performance of audio processing and classification systems (speech included) depends largely on an adequate representation of the audio signal in both clean and noisy conditions. This thesis therefore addresses the design of new preprocessing and acoustic feature extraction schemes for two different tasks: Automatic Speech Recognition (ASR) and Acoustic Event Classification (AEC). The common thread of the proposed methods is Non-Negative Matrix Factorization (NMF), which has proven to be a powerful tool for analyzing audio signals. First, an NMF-based method for speech denoising is proposed which, unlike previous approaches, does not assume prior knowledge about the nature of the noise. The method is evaluated for both speech enhancement and ASR, and it performs better than the conventional Spectral Subtraction (SS) technique. Second, three new parameterization schemes are proposed for AEC. The first is an extension of the conventional Mel-Frequency Cepstral Coefficients (MFCC) and consists of high-pass filtering the audio signal. The second improves the temporal feature integration technique known as Filter Bank Coefficients (FC): NMF is used as an unsupervised method to learn the optimal FC filter bank. The third adds cepstral features derived from the NMF activation (gain) coefficients, motivated by the robustness that NMF shows in noisy conditions.
    Experiments show that, in general terms, these three feature extraction schemes improve the performance of the acoustic event classification system with respect to the MFCC-based baseline, in both clean and noisy conditions, for different noise types and signal-to-noise ratio (SNR) levels.
    Doctoral Program in Multimedia and Communications (Programa de Doctorado en Multimedia y Comunicaciones). Chair: Javier Macías Guarasa. Secretary: Carmen Peláez Moreno. Member: Rubén San Segundo Hernánde
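
For readers unfamiliar with NMF, the factorization that this abstract builds on can be sketched as follows. This is a minimal illustration using the generic multiplicative updates for the Euclidean cost, not the thesis's denoising method, whose details the abstract does not give. The function name `nmf`, the rank, the iteration count and the toy data are all assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x frames) as W @ H.

    W holds `rank` non-negative basis vectors (columns) and H holds their
    non-negative activations (gains) over time.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost ||V - WH||^2.
        # Factors stay non-negative because every term is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram" that is exactly a product of non-negative factors.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 50))
W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In audio applications V is typically a magnitude or power spectrogram, so the columns of W act as spectral patterns and the rows of H as their gains over time. Those gains are the activation coefficients the abstract refers to.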

    NMF-based temporal feature integration for acoustic event classification

    Proceedings of: 14th Annual Conference of the International Speech Communication Association. Lyon, France, 25-29 August 2013. In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks that combines the temporal feature integration technique called Filter Bank Coefficients (FC) with Non-Negative Matrix Factorization (NMF). FC aims to capture the dynamic structure of the short-term features by summarizing the periodogram of each short-term feature dimension in several frequency bands using a predefined filter bank. Because the commonly used filter bank was devised for other tasks (such as music genre classification), it can be suboptimal for AEC. To overcome this drawback, we propose an unsupervised NMF-based method for learning the filters that collect the most relevant temporal information in the short-term features for AEC. The experiments show that the features obtained with this method significantly improve the classification performance of a Support Vector Machine (SVM) based AEC system compared with the baseline FC features. This work has been partially supported by the Spanish Government grants TSI-020110-2009-103, IPT-120000-2010-24 and TEC2011-26807.
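
The FC computation described in this abstract (a periodogram per feature dimension, summarized in bands by a filter bank) might look roughly like the sketch below. The rectangular filter bank, the band edges, the segment size and all function names are hypothetical choices for illustration. The paper's predefined filter bank and its NMF-learned filters are not specified in the abstract.

```python
import numpy as np

def periodogram(X):
    """Power spectrum over time of each feature dimension.

    X has shape (n_dims, n_frames); the result has shape
    (n_dims, n_frames // 2 + 1).
    """
    n_frames = X.shape[1]
    spectrum = np.fft.rfft(X, axis=1)
    return (np.abs(spectrum) ** 2) / n_frames

def rectangular_filter_bank(n_bins, band_edges):
    """Hypothetical filter bank: one flat filter per [lo, hi) bin range."""
    U = np.zeros((len(band_edges), n_bins))
    for k, (lo, hi) in enumerate(band_edges):
        U[k, lo:hi] = 1.0
    return U

def fc_features(X, U):
    """Summarize each dimension's periodogram in the bands defined by U."""
    return periodogram(X) @ U.T  # shape (n_dims, n_bands)

# Toy segment: 12 short-term feature dimensions over 64 frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 64))
U = rectangular_filter_bank(33, [(0, 1), (1, 4), (4, 12), (12, 33)])
F = fc_features(X, U)
```

Under this reading, learning the filters with NMF would amount to replacing the fixed `U` with non-negative filters estimated from the periodograms of training data. How the paper arranges that factorization is not stated here.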

    Feature extraction based on the high-pass filtering of audio signals for Acoustic Event Classification

    In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks. First, we study the spectral characteristics of different acoustic events in comparison with the structure of speech spectra. Second, based on the findings of this study, we propose a new parameterization for AEC, which is an extension of the conventional Mel-Frequency Cepstral Coefficients (MFCC) and is based on high-pass filtering of the acoustic event signal. The proposed front-end has been tested in clean and noisy conditions and compared with the conventional MFCC in an AEC task. The results indicate that high-pass filtering of the audio signal is, in general terms, beneficial for the system: removing frequencies below 100-275 Hz in clean conditions, and below 400-500 Hz in noisy conditions, during feature extraction significantly improves performance with respect to the baseline. This work has been partially supported by the Spanish Government grants IPT-120000-2010-24 and TEC2011-26807. Financial support from the Fundación Carolina and Universidad Católica San Pablo, Arequipa.
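
One simple way to picture the removal of low frequencies during feature extraction is to zero the spectral bins below a cutoff before any further processing. This is only a sketch under assumptions: the abstract does not say how the paper implements the filtering, and the function name, frame length, sampling rate and the 275 Hz cutoff (taken from the range quoted for clean conditions) are illustrative.

```python
import numpy as np

def highpass_power_spectrum(frame, sample_rate, cutoff_hz):
    """Power spectrum of one frame with bins below cutoff_hz set to zero."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    power[freqs < cutoff_hz] = 0.0  # discard the low-frequency content
    return freqs, power

# Toy frame: a strong 80 Hz component plus a weaker 1 kHz component.
sample_rate = 16000
t = np.arange(400) / sample_rate  # 25 ms frame
frame = np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
freqs, power = highpass_power_spectrum(frame, sample_rate, cutoff_hz=275.0)
peak_hz = freqs[np.argmax(power)]
```

In an MFCC-style pipeline, the mel filter bank, logarithm and DCT would then be applied to the remaining spectrum. Here the 80 Hz component is removed, so the 1 kHz component dominates what is left.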